Quality of life improvements #195

Merged 7 commits on Feb 2, 2024

@duttonw duttonw commented Nov 1, 2023

  • Add Dependabot to provide updated libraries and kick off testing on our behalf
  • Make flake8 ignore fewer rules to improve coding standards
  • Increase timeouts for non-imported records to handle very large datasets (1 million+ rows)
  • Re-add backward compatibility: if ckanext.xloader.use_type_guessing is not configured, fall back to the
    deprecated config option ckanext.xloader.just_load_with_messytables
  • Include chardet 5.2.0 UniversalDetector to better sniff the file encoding and stop failed direct loads; the detected encoding is used where confidence is over 70%.
    Otherwise fall back to ISO_8859, which covers the majority of Windows-1252
  • Better handle empty timestamp and numeric columns: direct load fails to import if a cell is an empty string rather than the required None object.
  • Direct import load did not handle hh:mm values correctly and imported them as a time delta from the current import time.
    So we try to convert cells to numbers or timestamps where applicable:
    If a list of types was supplied, use that.
    If not, try converting each column to numeric first, then to a timestamp.
    If both fail, keep it as a string.
  • If the resource passes is_resource_supported_by_xloader, add a nav button to the datastore so you don't need to visit 'manage/edit' first,
    when auth allows (resource uploader and above)
  • Performance improvement: on the DataStore tab of a resource, truncate very long logs:
    Only show the start and end (default first 50 and last 50), with the ability to request more rows or all rows if desired.
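
For illustration only, the log truncation described in the last bullet could be sketched roughly as below. The function name `truncate_logs` and its parameters are hypothetical and are not taken from the xloader code; only the default of 50 leading and 50 trailing entries comes from the description above.

```python
def truncate_logs(logs, head=50, tail=50):
    """Return (visible_entries, skipped_count) for a list of log entries.

    Hypothetical helper: keeps the first `head` and last `tail` entries and
    reports how many entries in the middle were left out.
    """
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")

    total = len(logs)
    if total <= head + tail:
        # Short enough to show in full; nothing is skipped.
        return list(logs), 0

    # Slice from an explicit index so that tail=0 yields an empty tail
    # (logs[-0:] would return the whole list).
    visible = list(logs[:head]) + list(logs[total - tail:])
    return visible, total - head - tail
```

A caller could render the skipped count as a "show more rows" control between the two halves, and pass larger `head`/`tail` values when the user asks for more.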

As well as lots of extra tests for edge conditions
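
As a rough sketch of the cell conversion order described above (supplied type first, otherwise numeric, then timestamp, then string), assuming a per-cell helper: `convert_cell`, `to_numeric`, `to_timestamp`, the type names and the timestamp formats are all hypothetical and chosen for the example, not taken from the PR's code.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Assumed formats for the example; each one requires a date part,
# so a bare "hh:mm" value is never treated as a timestamp.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def to_numeric(text):
    """Return a finite Decimal, or None if the text is not numeric."""
    try:
        number = Decimal(text)
    except InvalidOperation:
        return None
    return number if number.is_finite() else None


def to_timestamp(text):
    """Return a datetime, or None if no assumed format matches."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


CONVERTERS = {"numeric": to_numeric, "timestamp": to_timestamp}


def convert_cell(value, declared_type=None):
    """Convert one cell; empty non-text cells become None, not ''."""
    text = "" if value is None else str(value).strip()
    if text == "":
        return "" if declared_type == "text" else None
    if declared_type == "text":
        return text

    # Use the supplied type if there is one; otherwise guess in order.
    if declared_type in CONVERTERS:
        candidates = [declared_type]
    else:
        candidates = ["numeric", "timestamp"]

    for name in candidates:
        converted = CONVERTERS[name](text)
        # Compare against None, not truthiness, so that 0 is kept.
        if converted is not None:
            return converted
    return text  # both conversions failed: keep it as a string
```

With this sketch, `convert_cell("")` gives `None`, `convert_cell("0")` gives `Decimal("0")`, and `convert_cell("12:30")` stays the string `"12:30"`.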

* Performance Improvement: Skip rows when showing datastore load when handling multi-million-line records
* Improve testing on mixed formatting where type loading causes bad loads when a timestamp/numeric column has an empty field (it can't be a string; it must be None)
* Include shortcut to datastore when loaded for Package/Resource maintainers+
* Include Dependabot PR auto-raising to help keep dependencies current
* Make flake8 more stringent
* [QOLDEV-347] apply 'str' fallback type correctly
* [QOLDEV-347] fix validation errors on empty strings
* [QOLDEV-424] set default CSV sample size in config to match previous product 1000 lines
* [QOLDEV-424] handle parsing CSV file with commas inside quotes better
…fidence

Also have fallback to windows encoding if all else fails
@duttonw duttonw marked this pull request as ready for review November 1, 2023 22:55
- Windows-1252 is a superset, which makes it more useful for this purpose
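
A minimal sketch of the encoding sniffing discussed in this PR, assuming chardet's `UniversalDetector` is fed the file in chunks and its result is accepted only above the 70% confidence mentioned in the description. The helper names, the chunk size, the strict `>` comparison and the use of Python's `cp1252` codec as the Windows-1252 fallback are assumptions made for the example, not the PR's actual code.

```python
FALLBACK_ENCODING = "cp1252"  # Python codec name for Windows-1252
MIN_CONFIDENCE = 0.7          # "over 70%" from the PR description


def choose_encoding(result, min_confidence=MIN_CONFIDENCE,
                    fallback=FALLBACK_ENCODING):
    """Pick an encoding from a chardet-style result dict.

    `result` is expected to look like
    {"encoding": "utf-8", "confidence": 0.99, ...}.
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding and confidence > min_confidence:
        return encoding
    return fallback


def sniff_encoding(path, chunk_size=64 * 1024):
    """Detect a file's encoding incrementally (hypothetical helper)."""
    # Imported here so choose_encoding() is usable without chardet installed.
    from chardet.universaldetector import UniversalDetector

    detector = UniversalDetector()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            detector.feed(chunk)
            if detector.done:  # detector is already confident enough
                break
    detector.close()  # finalises detector.result
    return choose_encoding(detector.result)
```

Feeding chunks and stopping once `detector.done` is set avoids reading a very large file in full just to guess its encoding.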

duttonw commented Nov 7, 2023

Hi @kowh-ai, @smotornyuk
Just wondering what your capacity is for looking over this improvement PR.

Regards,

@duttonw

ThrawnCA and others added 3 commits January 29, 2024 16:35
- Recognise 0 as a valid numeric value
- Fix whitespace and unused import
- Extract maximum retry count to a constant
- Use context managers to automatically close streams
- Add README note about configuring PostgreSQL date style
- Add titles to queued jobs so they are more easily administered
@ThrawnCA ThrawnCA merged commit 2b26209 into ckan:master Feb 2, 2024
3 checks passed